[57] ABSTRACT

An apparatus and method for searching for records of 
database items with incomplete or incorrectly provided 
input data. Database queries are automatically created 
and executed in a manner that has a high probability of 
selecting the correct record indicative of a desired item 
from a retrieved set of candidates. The queries comprise 
search expressions which are generated for supplement- 
ing each one of a series of input words comprising the 
input data. These search expressions include terms and 
phrases that are equivalent to each input word and also 
include expanded acronyms and abbreviations. When 
required, the search expressions further include words 
that are close to an input word when it appears to be 
misspelled. 

28 Claims, 3 Drawing Sheets 
APPARATUS AND METHOD FOR FINDING
RECORDS IN A DATABASE BY FORMULATING A
QUERY USING EQUIVALENT TERMS WHICH
CORRESPOND TO TERMS IN THE INPUT QUERY

BACKGROUND OF THE INVENTION 

1. Technical Field 

This invention relates to data retrieval techniques and
more particularly, to an apparatus and method for
retrieving a data record of an item or object in a database
with limited or incomplete input data.

2. Description of the Prior Art

Service and product providers, when responding to
customer requests, are required to locate specific
records in databases which identify an item or object
sought by a customer. Often the information provided
by the customer is not complete or partially incorrect,
however. One such example of an operation wherein a
customer provides information for requesting a specific
item is that provided by a mail order information center.
The customer may, for example, electronically submit
an order for a specific book for which he or she has only
partial information and the information center then has
to locate the specific record of this book in its database
before it can fill the order. Orders are most often
entered by customers using a request program that is
available on thousands of computers. There is no
validation for these requests but the information provided by
the customers is entered in identified fields, such as title,
author and the like. This information may consist of,
singularly or in combination, arbitrary abbreviations,
acronyms and even incorrect or misspelled words.

Thus the task for the service or product provider is to
find the item or object with the given information. This
type of database searching is most often done by
operators who are familiar with the items or objects and the
databases, and who also know how to resolve errors in the
input data provided by the customers. Using known
information retrieval techniques, a skilled operator can
search the databases and modify the searches based on
results obtained along the way. While this type of
database searching has been found satisfactory where there
are ample, available skilled operators, nevertheless, in
order to be cost effective and still assure uniformity of
results, it is desirable to automate the process of finding
specific records of items or objects in databases.

SUMMARY OF THE INVENTION 

In accordance with the invention, a database
interrogation system is arranged for automatically creating
database queries that have a high probability of finding
the correct record of an item or object in an information
database with limited or incomplete search information
as input data. The input data typically comprises an
input string of target words and the query is created by
examining each one of the target words in the string.

In accordance with one aspect of the invention, when
requests are input into the system, the target words are
examined and a set of search expressions is created from
a search expression database. This database contains
words, abbreviations, and acronyms equivalent to
words in an identified field of the information database
for items to be searched. By creating the set of search
expressions, the database interrogation system
supplements each term used by the customer with additional
terms and phrases that provide an equivalent
representation of the term from the original request for
increasing the likelihood of retrieving the correct record of the
corresponding information database item. Once the set
of search expressions is created, the search expressions
are combined in ordered queries and executed in the
assigned order against the information database. The
records of items retrieved from the information
database through the queries are evaluated in accordance
with a predetermined parameter and the record of the
item best fitting the original input string of target words
is selected as the correct record.

In accordance with another aspect of the invention, 
when one of the target words input into the system is 
examined and cannot be found in the search expression 
database, a second database is examined to generate a 
search expression of words from the information data- 
base. This search expression represents probable alter- 
native words that are close to the given target word 
which is most likely misspelled. The use of these alter- 
native words increases the likelihood of retrieving the 
correct record of the corresponding database item. 
Once these search expressions are created, they are 
combined with the other existing search expressions in 
ordered queries and executed in the assigned order 
against the information database. The records of items 
retrieved from the information database through these 
queries are evaluated and the record of the item best 
fitting the original input string of target words is se- 
lected as the correct record. 

BRIEF DESCRIPTION OF THE DRAWING 

This invention and its mode of operation will be more 
clearly understood from the following detailed descrip- 
tion when read with the appended drawing in which: 

FIG. 1 is a block diagram of an information display 
system for interrogating an information database in 
accordance with the invention; 

FIG. 2 is a flow chart depicting the process used in 
creating a set of databases of terms for obtaining search 
expressions, in accordance with the invention; and 

FIG. 3 shows a flow chart illustrating the operation 
of the information display system of FIG. 1 in creating 
and executing database queries in accordance with the 
invention. 

DETAILED DESCRIPTION 

Referring now to FIG. 1 of the drawing, there is 
shown a general block diagram of an information dis- 
play system employing the principles of the present 
invention. This information display system may be in an 
information center or library, for example, and is usu- 
ally located remotely with respect to computers from 
which it receives customer provided information. The 
elements employed in the information display system 
are computer 100, timing generator 115, video memory 
122, video controller 123, video display terminal 125, 
with a display screen 126, modem 130, and control 
interface modules 140, 141, and 142. 

The customer provided input to the system is pro- 
vided over a two-way communications line 131 to the 
modem 130. When a customer decides to order an item 
or object from the information center, inquiry com- 
mands from the computer 100 are provided. These com- 
mands prompt the customer to enter the requested in- 
formation in identified fields such as title and/or author 
when, for example, the item sought is a book. These 
inquiry commands from the computer 100 are sent to a 
remote computer or terminal (not shown) over the 
communication line 131 and the requested information
is similarly returned over this line to the communications
modem 130. The communications modem 130 and
communications line 131 also are used by the system for
automatically communicating an order request to an
appropriate provider after the record of an item or
object has been found in a database query, described in
detail later herein.

The customer provided information is coupled to the
communications interface module 140 for input to the
computer 100 or for input to a temporary disk storage
medium 143, via the control modules 141 and 142, for
later retrieval by a system user performing a database
search for a record representative of an item stored on
one or more of the databases interfaced to the system.

The peripheral control module 142 interfaces the
computer 100 to the appropriate ones of a series of
databases, or data files, illustratively shown as databases
210, 211, 230, 231, 235, 236, 245 and 246, in accordance
with the service or task being performed. It is to be
understood that other databases providing additional
services or tasks also may be simultaneously interfaced
with the computer 100. The peripheral control module
142 also couples user provided input from a keyboard
accessible by the system user to the computer 100.
Input/output (I/O) controller module 141 provides a data
link between modules 140 and 142 and the processor
data bus 101 which connects to the computer 100.

Contained in the computer 100 are a data processor
104, random access memory (RAM) 105 and read only
memory (ROM) 106. [...] input/output control module 141
[...] memory [...] data
to the processor bus 101 for loading the video memory
[...] video [...]
accepts information from the video memory 122 and
provides it in a form suitable for displaying on the
display [...] terminal 125. [...]
signals to the video data
bus 121 and the processor data bus 101 are provided
by timing generator 115.

With reference to FIG. 2, there is shown a flow chart
depicting the process used in creating a set of databases
of terms for obtaining search expressions. These search
expressions are generated for supplementing each target
word originally used by the customer and, in a first
search configuration, comprise terms and phrases that
contain the equivalent representation of the intended
term derived from the original target word. Through
the addition of these terms and phrases, the likelihood of
retrieving the correct record for a particular item is
considerably increased. In accordance with the invention,
these search expressions include expanded acronyms
and abbreviations and also, when required, an
expression that represents words that are close to the
target word when it appears to be misspelled.

An information or item database 210 is created for
each of a plurality of customer input fields, such as book
title or author name in the case of a book or journal, by
taking all the data in each of the fields that will be used
for identification from the item database. In the process
of building the search expression and trigram databases,
by way of example, a list of all the words in the journal
titles, a list of all of the words in the book titles, and/or
a list of all the author last names of the books is
extracted from the item database.

Each item database is used to generate search
expressions for an associated customer input field. Thus the
author search expression database is used to find search
expressions for author names given by the customer and
the book title search expression database is used to find
book titles given by the customer. The fields that are
used for retrieving items from the item database are
extracted to generate one or more unique word lists in
step 215.

More than one item database may be used in the
search for a record of an item, in accordance with the
invention. In the case of a search for the record
representing a title of a conference proceeding, for example,
an item database created for books is searched and if the
record having the desired title is not found, then a
journal titles database, for example, is also searched for the
record having the desired title. (Some conference
proceedings are published as monographs, and will appear
in a books database, while other conference proceedings
are published in journals and will therefore appear
in a journals database.) The system may be arranged so
that any number of multiple databases may be
searched for a record representative of a particular
item.

The words contained in the unique word list generated
at step 215 are used [...] for
generating search [...]
search expressions [...].
Abbreviations are generated in step 220 and illustratively
include, without limitation, the following:

the first three letters of the word
the first four letters of the word
the first letter and the next two consonants
the first letter, the next [...] consonants and the
last consonant

By way of example, the word computer may be
abbreviated com, comp, cmp, cmptr, or cmpr. And the
word commercial may be abbreviated [...] com [...].
By sorting these
target words by their abbreviations, and then combining
the terms corresponding to each abbreviation, a search
expression is created in step 225 that consists of a series
of or terms that contain the possible words that the
abbreviation can represent. Therefore, from the above
example, the term com would have a search expression
of

(computer or commercial)

In addition to the target words obtained from the
items database 210 and indexed for generating search
expressions, a database 230 of subject area acronyms
may be used in generating the search expressions in
step 225. For example, the search expression for ATT in
the database 230 may be defined as

(ATT or AT&T or (american and telephone and
telegraph))

All searching is done without regard to case and all
letters are treated as lower case.

In the create search expressions step 225, the search
expression for the search key (ATT in this example) is
combined with other search expressions for the same
key. The resulting search expression may appear as

(ATT or AT&T or (american and telephone and
telegraph) or att or (attention or attachment)).

Other words also could have att as their abbreviation.
They would similarly appear as part of the search
expression.
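The abbreviation rules of step 220 and the grouping of step 225 can be sketched roughly as follows. This is only an illustration, not the patented implementation: the function names, the vowel set, and the `tail_consonants` value (the source text for that rule is partly illegible) are assumptions, and the sketch covers only the rules that are legible above.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: every other letter counts as a consonant


def abbreviations(word, tail_consonants=2):
    """Return candidate abbreviations for one non-empty word (step 220 sketch).

    tail_consonants is an assumed count for the rule that joins the first
    letter, some following consonants and the last consonant.
    """
    w = word.lower()
    consonants = [c for c in w[1:] if c.isalpha() and c not in VOWELS]
    result = {
        w[:3],                            # first three letters
        w[:4],                            # first four letters
        w[0] + "".join(consonants[:2]),   # first letter + next two consonants
    }
    if len(consonants) > tail_consonants:
        # first letter + next N consonants + last consonant
        result.add(w[0] + "".join(consonants[:tail_consonants]) + consonants[-1])
    return result


def build_search_expressions(unique_words):
    """Group words by abbreviation and join each group with 'or' (step 225 sketch)."""
    by_key = defaultdict(set)
    for word in unique_words:
        for key in abbreviations(word):
            by_key[key].add(word.lower())
    return {key: "(" + " or ".join(sorted(words)) + ")"
            for key, words in by_key.items()}
```

With the two words from the example, `build_search_expressions(["computer", "commercial"])["com"]` gives the same pair of alternatives as the expression shown above, in alphabetical order.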
Each record in the search expression database 235
will have: 1) a word that might be found in a request
from a system user (the database search key); and 2) a
search expression that represents alternatives that the
search key (a word that may be input by a customer)
could represent. For example, the word j may have a
search expression

(j or journal)

The search keys and search expressions created in
step 225 for the particular input field are stored in a
search expression database 235. Search expression
databases may exist for titles, authors and any other specific
fields that are selected for examination.

The words contained in the unique word list generated
at step 215 are used as input to an algorithm for
generating a database that will be used to try to correct
misspelled words (trigram database 245). One of these
databases is associated with each customer input field
used to find the requested item. All the trigrams (sets of
three consecutive characters) of each unique word of at
least four characters are generated in step 245 and
indexed (with its associated unique word). In addition to
indexing the trigram field, each of the first and last
trigrams of each word are indexed with its associated
unique word as separate indexed fields. For short
words, the first and last digram (consecutive 2
characters) are also indexed. For very short words, the first
letter is also indexed with its associated unique word.

A record in the trigram database contains the unique
word and the following fields, each of which is indexed
for searching: all the trigrams, the first trigram, the last
trigram, the first digram (when applicable), the last
digram (when applicable), the first letter (when
applicable). The record key (a number used to order the
records in the database) is generated in such a way as to
include information about the length of the unique
word. The search algorithm permits the restriction of
the search by record key range, thus the search against
this database is set for records whose words are close in
length to the misspelled word in question.

All words that contain at least four characters in the
unique word list are broken into trigrams. Thus, by way
of example, the word computer will generate the
trigrams

com omp mpu put ute ter.

For short words, additional grams are used:

for words less than eight characters, the first and last
digrams are generated and indexed
for words less than six characters, the first letter is
indexed

This additional information provided for short words
is beneficial since in such words, one incorrect letter
could make both the first and last trigrams incorrect. If
trigrams were employed in these short words, when the
system looks for the set of candidates, the correct word
might not be included in the candidate list. The only
candidates retrieved through this process are those
whose length is close to the length of the misspelled
word in the request.

As indicated previously herein, when a target word
cannot be found in the search expression database 235,
the word is assumed to be misspelled. As a result, the
process generates a search expression from the trigram
database 245 with probable alternatives to misspelled
target words input by the customer. Through use of this
search expression, the likelihood of retrieving the
correct record for a particular item is considerably
increased.

To retrieve candidate words from the trigram
database, a search is generated as follows:

(first-trigram or last-trigram) and (second-trigram or
third-trigram or ... )

For short words the first digram, last digram and first
letter are used in combinations depending on the word
length. The candidate words are evaluated by seeing
how many character changes (replace, insert, or delete)
are needed to match the candidate word to the
misspelled word. The search expression is created by using
the candidates that are within some specified limit of
closeness as described with reference to FIG. 3. (The
acceptable closeness value may vary by the field being
considered.)

Referring next to FIG. 3, there is shown a flow chart
illustrating the operation of the information display
system of FIG. 1 when configured for performing
database queries. This system creates and executes database
queries that have a high probability of finding the
correct record for an item or object in a database with
limited or incomplete search information being
provided as input data. The functions provided by data
processor 104 are advantageously determined by a
process or program stored in ROM memory 106, these
components being shown in and previously described
with reference to FIG. 1.

The process is entered at step 301 where an input
string of target words, representing a received customer
request, is entered into the system for comparison. Each
word in the input string of target words is checked
against the known words (search keys) for the
particular input field in the search expression database in step
302.

If at decision 302, a search expression is available for
the input target word, the search expression retrieved at
step 303 is saved for use in step 306 and step 307. If,
however, a target word cannot be found in the search
expression database, the target word is assumed to be
misspelled and the process advances to step 304. At step
304, the process, using the trigram database, generates a
search expression of words that are close to the target
word for use in step 306 and step 307. The target word
is also included in the search term in case the search
expression database does not contain the complete set of
words. This may happen if, for example, the item
database gets updated frequently with new records of items
that are not similarly updated in the search expression
database.

Using both step 303 and step 304, each word in the
input string of target words is examined and a search
expression is either retrieved from the search expression
database or generated using the trigram database. From
both steps 303 and 304, the process advances to decision
305 where the input string of target words is examined
for additional words. If there is a word in the request
that has not been examined, the process returns to step
302. If all of the words in the request have been
examined, the process advances to decision 306.

At decision 306, a decision is made as to whether one
or more search expressions are available. If no search
expressions were obtained, the process advances to step
308 and the search is handed-off to a manual process.
From step 308, the process is exited. If, however, one or
more search expressions is obtained at this decision 306,
the process advances to step 307. At this step 307, a set
of ordered queries is created. These ordered queries
comprise the search expressions combined in queries in
order of most restrictive to least restrictive, for
advantageously retrieving the fewest number of candidates
from the item database.
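The per-word loop of steps 302 through 306 can be sketched as below. This is a minimal sketch under stated assumptions: the search expression database is modeled as a plain dictionary, and `spelling_candidates` is a hypothetical callable standing in for the trigram lookup of step 304; neither name comes from the patent.

```python
def collect_search_expressions(target_words, expression_db, spelling_candidates):
    """Steps 302-305 sketch: one search expression per target word."""
    expressions = []
    for word in target_words:
        key = word.lower()  # searching is done without regard to case
        if key in expression_db:
            # decision 302 / step 303: a stored expression exists
            expressions.append(expression_db[key])
        else:
            # step 304: assume a misspelling; keep the target word itself too
            alternatives = sorted(set(spelling_candidates(key)) | {key})
            expressions.append("(" + " or ".join(alternatives) + ")")
    return expressions


def plan_search(target_words, expression_db, spelling_candidates):
    """Decision 306 sketch: return None to signal the manual hand-off (step 308)."""
    expressions = collect_search_expressions(
        target_words, expression_db, spelling_candidates)
    if not expressions:
        return None
    return expressions  # these would feed the query ordering of step 307
```

For instance, with `expression_db = {"j": "(j or journal)"}` and a stand-in speller that proposes katakana for katagana, the input words J and Katagana yield one stored expression and one generated expression that still contains the original word.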

Each query is made as inclusive as possible to try to
retrieve the desired candidate (high recall). At the same
time, as many search expressions as possible are used to
effectively narrow the list of candidates (high
precision).

From step 307, the process advances to step 309
where the queries are executed in the order of most
restrictive to least restrictive. Thus the query
containing the most search expressions is executed first. If no
candidates are found during the execution of the first
one of the ordered queries, the next query in the order
is executed, until one or more candidates is retrieved.
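One way to picture the ordering is sketched below. The text states only that the query with the most search expressions runs first; how the less restrictive queries are formed is not spelled out, so dropping expressions via `itertools.combinations` is an assumption made for illustration, as are the function names and the `execute` callable.

```python
from itertools import combinations


def ordered_queries(expressions):
    """Yield and-ed queries from most restrictive (all expressions) to least.

    Assumption: a less restrictive query is any subset with fewer expressions.
    """
    for size in range(len(expressions), 0, -1):
        for subset in combinations(expressions, size):
            yield " and ".join(subset)


def run_until_hit(expressions, execute):
    """Run queries in order until one returns candidates (step 309 sketch).

    execute is a hypothetical callable that takes a query string and
    returns a list of candidate records.
    """
    for query in ordered_queries(expressions):
        candidates = execute(query)
        if candidates:
            return query, candidates
    return None, []
```

The number of subsets grows quickly with the number of expressions, so a practical system would presumably cap or prune the list; that choice is outside what the text specifies.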

As each query is executed, the results are handed to
step 310 for evaluation. If a single candidate is retrieved
with a high enough closeness value (step 311), the
process is exited and the system automatically generates a
request for the desired item, such as, in the case of a
book, an order directly to the appropriate publisher. If
a single candidate is not retrieved with a high enough
closeness value, the candidates' values are saved in a list
in step 312 and a determination is made in step 313 as to
there being another query to execute. If no queries are
left to execute, the process is exited and the results of
the candidates are reported. If additional queries remain
unexecuted, the process returns to step 309 where the
next one of these queries is executed.

Each search is done against one or more databases, in
accordance with the invention. For example, a request
for a conference proceeding is run against a book titles
database, as well as against a journals titles database
because one can find conference proceedings in both
databases. (Some conferences are published as journals
while others are published as monographs.)

Criteria such as exact word matching, ordered word
prefix matching, and number of matched words/
prefixes are used to assign an indicator for a closeness
value to each candidate in step 310.

For example, given the request for

C prog, lang

the initial search might retrieve the following titles:

C notes: a guide to the C programming language.

From Pascal to C: an introduction to the C programming
language.

Dr. Dobb's sourcebook: a reference guide to the C
programming language.

Complete C language programming for the IBM PC.

The C programming language

C programming language: an applied perspective

Complete C language programming for the IBM

Programming using the C language

Before the searching is performed, stopwords, i.e.,
words that are neither indexed nor used for searching, are
removed from the requested title and the retrieved
candidates. Such stopwords are, for example: among, all, an,
and, are, as, at, be, been, between and the.

If all the words of the request match all the words of
the candidate, the closeness value will be high. In the
example provided, none of the candidates gets this
value. When the evaluation algorithm matches the
candidates against the first four characters of each of the
words, several candidates have all the words in order.
Candidates that match all the words (parts of words) in
order with few or no remaining words in the candidate
will have a higher closeness value than candidates with
more extra words. In this case, therefore,

The C programming language

is the highest valued candidate which is selected in
accordance with step 311. If there are multiple highest
value candidates, or the closeness values determined in
step 310 are lower than some cutoff, however, all
candidates are saved (step 312). Step 313 decides whether
there is another query (the next less restrictive query) to
perform. If there is, step 309 is executed. If not, (no
more queries are available) the candidates and closeness
values for the candidates are reported and the process is
exited.
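The prefix-based evaluation of step 310 might look something like the sketch below. The scoring tuple, the greedy in-order matching, and the function names are assumptions chosen to reproduce the ranking in the example; the stopword set holds only the words listed above, and the patent does not give an actual formula.

```python
import re

# Only the stopwords named in the text; the real list is presumably longer.
STOPWORDS = {"among", "all", "an", "and", "are", "as", "at",
             "be", "been", "between", "the"}


def words(text):
    """Lower-case the text, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def closeness(request, candidate, prefix_len=4):
    """Return (matched_in_order, -extra_words); larger tuples rank higher."""
    wanted = [w[:prefix_len] for w in words(request)]
    have = [w[:prefix_len] for w in words(candidate)]
    position = 0
    matched = 0
    for prefix in wanted:
        if prefix in have[position:]:
            # consume the candidate up to this match so word order is respected
            position = have.index(prefix, position) + 1
            matched += 1
    return matched, -(len(have) - matched)


def best_candidate(request, candidates):
    """Pick the candidate with the highest closeness tuple."""
    return max(candidates, key=lambda title: closeness(request, title))
```

Applied to the request and the titles listed above, this scoring ranks the title with all three prefixes in order and no extra words first, which agrees with the selection described in the text.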

There are other fields sometimes provided in the 
customer's request that can be used to reduce the num- 
ber of candidates. For example, the publisher might be 
entered for a book. Although this publisher information 
is sometimes entered by a customer, it can easily be 
input incorrectly so its preferred use is for verification 
or narrowing rather than for primary identification. 
Also there may be some local identification provided 
such as the key from the local online catalog. 

For illustrating the operation of the database interro- 
gation system in automatically finding the record of an 
item in a database, the following representative search 
examples are provided. The input fields have a tag at 
the start of the line. For book items, "title:" for the title
field and "author:" for the author field. For journal 
items, .jname is the journal name. 

In the search string, the ? is used as a truncation
operator; that is, all words beginning with the characters up
to the ? are retrieved.

Each search query contains a field restriction. This
restricts the query so that only the given field is used in
retrieving the items. ("/ti" to restrict to the title field
and "/au" to restrict to the author field.) The restriction
will appear at the end of the search expression for
which it applies.

SEARCH 1 

Input:

title: Linear Systems: A State Variable Approach
With Numnerical Implementation
author: Raymond A. DeCarlo

Search Query:

(((linear or lineare or linearen or lineari?) and
system? and state? and variable? and approac? and
(numerically or numberical or numerical or
numnerical) and implementat?)/ti)

Found

.bauthor DeCarlo, Raymond A.,
.btitle Linear systems: a state variable approach with
numerical implementation

In this example, the word Numnerical is misspelled.
The search expression constructed for this word is

(numerically or numberical or numerical or
numnerical)

The other search expressions are obtained from the
search expression database. (Note also in this example
that the word numberical, obtained from the trigram
database is itself misspelled because the item database
has misspellings.)
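The spell correction used for Numnerical can be sketched as follows. This is an in-memory illustration only: it scans a word list instead of querying an indexed trigram database, it handles only words long enough to have trigrams, and the limits `max_changes` and `max_len_diff` are assumed values. The reading of the candidate search, in which a candidate must share the first or last trigram and at least one other trigram, is an interpretation of the search form given earlier.

```python
def trigrams(word):
    """All runs of three consecutive characters, in order."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


def edit_distance(a, b):
    """Number of character changes (replace, insert, delete) turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # delete
                               current[j - 1] + 1,             # insert
                               previous[j - 1] + (ca != cb)))  # replace
        previous = current
    return previous[-1]


def spelling_candidates(misspelled, vocabulary, max_changes=3, max_len_diff=2):
    """Return vocabulary words close to a misspelled word of 4+ characters."""
    grams = trigrams(misspelled)
    middle = set(grams[1:-1])
    found = []
    for word in vocabulary:
        if abs(len(word) - len(misspelled)) > max_len_diff:
            continue  # mimics the record-key range restriction on length
        word_grams = trigrams(word)
        if not word_grams:
            continue
        shares_end = (word_grams[0] == grams[0]
                      or word_grams[-1] == grams[-1])
        if (shares_end and middle & set(word_grams)
                and edit_distance(misspelled, word) <= max_changes):
            found.append(word)
    return sorted(found)
```

With a small vocabulary containing numerical, numerically and numberical, the call `spelling_candidates("numnerical", vocabulary)` returns those three words, which together with the original target word make up the expression shown in Search 1.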

SEARCH 2 

Input: 

title: Remembering the Katagana author: Morsbach- 

,Kurebayashi,Heisig 
Search Query: 

((heisig or morsbach)/au and (remember? and 

(katakana or katagana))/ti) 
Found 
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.bauthor Heisig, James W. expression database associated with the informa- 

.btitle Remembering The Hiragana: a complete tion database; 

course on how to teach yourself the Japanese sylla- generating a set of search expressions for each one of 

bary in 3 hours multiple ordered queries, each search expression 

In this example, the word Katagana is not correct. 5 including words from the search expression data- 

The search expression base for providing an equivalent representation of 

(katakana or katagana) one or more of the input string of target words; 

is constructed using the spell correction algorithm. It retrieving records from the information database, 

should be noted that the target word in the request is each record retrieved in each of the multiple or- 

still used in the expression in case the search expression 10 dered queries containing the set of search expres- 

database does not contain all of the item database sions respectively generated for one of the multiple 

words. ordered queries, the multiple ordered queries being 

For the author name search expression, the author arranged in order of a most restrictive query to a 

search expression database is used and only two of the least restrictive query; and 

three names are found, so those are the ones used, 15 selecting in accordance with a predetermined param- 

Note also in the closeness evaluation, subtitles, as eter a retrieved one of the records that best 

illustrated by the :a complete course on how to teach matches the input string of target words, 

yourself the Japanese syllabary in 3 hours on the .btitle 2. The method of claim 1 wherein the words in the 

line, are considered in the word count matching criteria, search expression database include expanded acronyms 

since the customer may request an item with or without 20 and abbreviations that are equivalent to associated 

the subtitle information. words stored in the information database. 

SEARCH 3

Input:
.jname J. Comput. Sys. Sci.

Search Query:
((j or journal)
and (comput or computability or computacio? or computado? or computat? or compute or compute? or computed or computer? or computi?)
and (seybold? or statphy? or superall? or survey? or symbiosi? or symmetr? or symposi or symposia or synapse or syndrome? or synerget? or synergi? or synopse? or synopsi? or synthese? or synthesi? or synthet? or syracu? or sys or sysmpos? or syst or syste or system? or systolic)
and (sci or scie or scien or science? or scienti? or scientometr? or scienza or scienze or scil or scintillat? or sciquest or scission))/ti

Found:
.jname Journal of Computer and System Sciences.

It should be noted in this example that the words Sys and Sci generate fairly large search expressions; a person would not likely use Sys for the word Seybolds (seybold?). The set retrieved for the term Sys will be larger than really needed, but the extraneous items will be eliminated when ANDed with the other three sets, or later when narrowing down the final retrieved item candidates.

Although the invention has been specifically described, it is obvious that many modifications and variations of the present invention are possible in light of the above teachings. For example, the present invention has application wherever people enter "free form" or unconstrained requests for goods or information. Such is the case in any kind of ordering service where the number of potential items is too large or too varied to support a catalog with stock numbers. It is therefore to be understood that within the scope of the appended claims, the invention may be practiced otherwise than as specifically described.

We claim:

1. A method of retrieving a record for an item in an information database in response to an input string of target words, the method comprising the steps of:
    comparing each word contained in the input string of target words with words contained in a search

3. The method of claim 1 wherein the most restrictive query contains the most search expressions and the least restrictive query contains the fewest search expressions.

4. The method of claim 3 wherein the ordered queries are executed by the executing step for retrieving records from the information database, the executing step including the steps of executing the most restrictive query and, if no records are retrieved, executing the next most restrictive query for retrieving records from the information database.

5. The method of claim 1 wherein the predetermined parameter comprises assigning a closeness value to each record found in accordance with the retrieving step.

6. The method of claim 5 wherein the closeness value assigned to each record is determined by evaluative criteria, said criteria including exact word matching and ordered word prefix matching.

7. The method of claim 6 further comprising the step of comparing the number of matched words and matched word prefixes in each of the retrieved records with said words and word prefixes in the input string of target words.

8. The method of claim 7 wherein the highest closeness value is assigned to the one of the retrieved records having the greater number of matching words and the fewer number of nonmatching words.

9. The method of claim 8 wherein the selecting step further comprises the step of requesting the item corresponding to the retrieved one of the records from an item provider.

10. A method of retrieving a record for an item in an information database in response to an input string of target words, the method comprising the steps of:
    comparing each word contained in the input string of target words with words contained in a first search expression database associated with the information database;
    generating a plurality of search expressions, each search expression including a word from the string of target words and words from the search expression database for providing an equivalent representation of one or more of the input string of target words, and one or more selected words close in character content to one of the target words not located in the first search expression database, the one or more selected words being obtained from a
    second search expression database for supplementing said target word;
    retrieving records from the information database, each record containing the words included in each of the search expressions; and
    selecting in accordance with a predetermined parameter a retrieved one of the records that best matches the input string of target words.

11. The method of claim 10 wherein the second search expression database comprises a trigram database, the trigram database containing words that are close in character content to misspelled target words.

12. The method of claim 11 wherein the predetermined parameter comprises assigning a closeness value to each record found in accordance with the retrieving step.

13. The method of claim 12 wherein the closeness value assigned to each record is determined by evaluative criteria, said criteria including exact word matching and ordered word prefix matching.

14. The method of claim 13 further comprising comparing the number of matched words and matched word prefixes in each of the retrieved records with said words and word prefixes in the input string of target words.

15. The method of claim 14 wherein the highest closeness value is assigned to the one of the retrieved records having the greater number of matching words and the fewer number of nonmatching words.

16. The method of claim 10 wherein the selecting step further comprises the step of requesting the item corresponding to the retrieved one of the records from an item provider.

17. A method of retrieving a record of an item from a plurality of information databases in response to a series of input words, the method comprising the steps of:
    comparing each word contained in the series of input words with words contained in a first and a second one of a plurality of search expression databases associated with a first one of the plurality of information databases;
    generating a set of search expressions, each search expression in the set including words from the first search expression database for providing an equivalent representation of one or more of the series of input words and words from the second search expression database for providing selected words close in character content to one or more of the series of input words;
    searching in the first one of the plurality of information databases for retrieving records containing the search expressions;
    searching in other of the plurality of information databases for retrieving records containing the search expressions when none of the retrieved records in the first one of the plurality of information databases best matches the series of input words in accordance with a predetermined parameter; and
    selecting in accordance with the predetermined parameter the retrieved record from any of the plurality of information databases that best matches the series of input words.

18. The method of claim 17 wherein the searching in other of the plurality of information databases is performed in a predetermined order.

19. The method of claim 17 wherein the predetermined parameter comprises assigning a closeness value to each record found in accordance with the retrieving step.

20. The method of claim 19 wherein the closeness value assigned to each record is determined by evaluative criteria, said criteria including exact word matching and ordered word prefix matching.

21. The method of claim 20 wherein the closeness value assigned to each record is further determined by the step of comparing the number of matched words and matched word prefixes in each of the retrieved records with said words and word prefixes in the series of input words.

22. The method of claim 21 wherein the highest closeness value is assigned to the one of the retrieved records having the greater number of matching words and matching prefixes and the fewer number of nonmatching words.

23. A system for retrieving a record for an item in an information database in response to an input string of target words, the system comprising:
    means for comparing each word contained in the input string of target words with words contained in a search expression database associated with the information database;
    means for generating a set of search expressions for each one of multiple ordered queries, each search expression including words from the search expression database for providing an equivalent representation of one or more of the input string of target words;
    means for retrieving records from the information database, each record retrieved in each of the multiple ordered queries containing the set of search expressions respectively generated for one of the multiple ordered queries, the multiple ordered queries being in order of a most restrictive query to a least restrictive query; and
    means for selecting in accordance with a predetermined parameter a retrieved one of the records that best matches the input string of target words.

24. The system of claim 23 wherein the words in the search expression database include expanded acronyms and abbreviations that are equivalent to associated words stored in the information database.

25. The system of claim 23 wherein the predetermined parameter comprises a closeness value indicator assigned to each record retrieved in the information database.

26. The system of claim 25 wherein the closeness value indicator assigned to each record is determined by evaluative criteria, said criteria including exact word matching and ordered word prefix matching.

27. The system of claim 25 further comprising means for comparing the number of matched words and matched word prefixes in each of the retrieved records with said words and word prefixes in the input string of target words.

28. The system of claim 25 wherein the highest closeness value indicator is assigned to the one of the retrieved records having the greater number of matching words and the fewer number of nonmatching words.
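By way of illustration only, the ordered-query execution and closeness-based selection recited in the claims might be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the `EQUIVALENTS` table is a hypothetical stand-in for the search expression database, whole-word equivalents replace the truncated (`?`) terms of the SEARCH 3 example, less restrictive queries are formed simply by dropping search expressions, and the closeness ranking (more matched words first, then fewer nonmatching words) is one possible reading of the evaluative criteria.

```python
from itertools import combinations

# Hypothetical stand-in for the "search expression database": each input
# word maps to equivalent terms (expanded abbreviations and acronyms).
EQUIVALENTS = {
    "j": ["journal"],
    "comput": ["computer", "computing"],
    "sys": ["system", "systems"],
    "sci": ["science", "sciences"],
}


def normalize(word):
    """Lower-case a word and drop surrounding periods, e.g. 'Sys.' -> 'sys'."""
    return word.lower().strip(".")


def search_expression(word):
    """Return the OR-group for one input word: the word plus its equivalents."""
    w = normalize(word)
    return {w, *EQUIVALENTS.get(w, [])}


def ordered_queries(words):
    """Yield queries from most restrictive (all expressions) to least (one).

    Assumption: a less restrictive query is formed by dropping expressions.
    """
    exprs = [search_expression(w) for w in words]
    for size in range(len(exprs), 0, -1):
        for combo in combinations(exprs, size):
            yield combo


def retrieve(titles, words):
    """Execute the ordered queries, stopping at the first that finds records."""
    for query in ordered_queries(words):
        hits = [
            t for t in titles
            if all(group & {normalize(x) for x in t.split()} for group in query)
        ]
        if hits:
            return hits
    return []


def closeness(title, words):
    """Rank key: more matched words first, then fewer nonmatching words."""
    tokens = [normalize(x) for x in title.split()]
    pos = matched = 0
    for target in map(normalize, words):
        # Ordered prefix matching; an exact match is also a prefix match.
        for i in range(pos, len(tokens)):
            if tokens[i].startswith(target):
                matched, pos = matched + 1, i + 1
                break
    return (matched, matched - len(tokens))


def select(titles, words):
    """Pick the retrieved record with the highest closeness value."""
    hits = retrieve(titles, words)
    return max(hits, key=lambda t: closeness(t, words)) if hits else None
```

With the input of SEARCH 3, `select` would be called as `select(titles, ["J.", "Comput.", "Sys.", "Sci."])`, where `titles` is any list of candidate title strings. When one input word matches nothing, the most restrictive query returns no records and the next query in the order is tried.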
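The trigram database of claim 11, which supplies words close in character content to a misspelled target word, might likewise be approximated as shown below. This is a sketch under assumptions, not the disclosed method: the padding scheme, the Dice coefficient used as the similarity measure, the 0.5 threshold, and the function names are all illustrative choices.

```python
def trigrams(word):
    """Return the set of 3-character substrings of a padded, lower-cased word."""
    padded = f"  {word.lower()} "  # assumed padding: two leading spaces, one trailing
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Dice coefficient over trigram sets; 1.0 for identical words."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def close_words(target, vocabulary, threshold=0.5):
    """Return vocabulary words at or above the threshold, most similar first."""
    scored = [(similarity(target, w), w) for w in vocabulary]
    return [w for s, w in sorted(scored, reverse=True) if s >= threshold]
```

For example, `close_words("sciense", ["science", "system", "journal"])` keeps only `science`, which could then be added to the search expression for the misspelled word.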